node-conditional distribution
28f0b864598a1291557bed248a998d4e-Reviews.html
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper extends the work of Yang et al (2012) on undirected graphical models where the node-wise conditional distributions are from the univariate exponential family; in this paper, the authors allow covariates to be included and also allow the distribution type to be node specific. They derive the form of the natural parameter for the response distributions under the assumption that the node-wise conditional distributions of the covariates are also from the univariate exponential family, and prove guarantees on the learnability of the models. Finally, the authors present some limited experimental results on synthetic and real data. The paper is well written, particularly given the density of material.
Graphical Models via Generalized Linear Models
Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models.
On Poisson Graphical Models
Undirected graphical models, such as Gaussian graphical models, Ising, and multinomial/categorical graphical models, are widely used in a variety of applications for modeling distributions over a large number of variables. These standard instances, however, are ill-suited to modeling count data, which are increasingly ubiquitous in big-data settings such as genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing classes of Poisson graphical models, which arise as the joint distributions that correspond to Poisson distributed node-conditional distributions, have a major drawback: they can only model negative conditional dependencies for reasons of normalizability given its infinite domain. In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables. We begin by discussing two strategies for truncating the Poisson distribution and show that only one of these leads to a valid joint distribution. While this model can accommodate a wider range of conditional dependencies, some limitations still remain. To address this, we investigate two additional novel variants of the Poisson distribution and their corresponding joint graphical model distributions. Our three novel approaches provide classes of Poisson-like graphical models that can capture both positive and negative conditional dependencies between count-valued variables. One can learn the graph structure of our models via penalized neighborhood selection, and we demonstrate the performance of our methods by learning simulated networks as well as a network from microRNA-sequencing data.
Conditional Random Fields via Univariate Exponential Families
Conditional random fields, which model the distribution of a multivariate response conditioned on a set of covariates using undirected graphs, are widely used in a variety of multivariate prediction applications. Popular instances of this class of models, such as categorical-discrete CRFs, Ising CRFs, and conditional Gaussian based CRFs, are not well suited to the varied types of response variables in many applications, including count-valued responses. We thus introduce a novel subclass of CRFs, derived by imposing node-wise conditional distributions of response variables conditioned on the rest of the responses and the covariates as arising from univariate exponential families. This allows us to derive novel multivariate CRFs given any univariate exponential distribution, including the Poisson, negative binomial, and exponential distributions. Also in particular, it addresses the common CRF problem of specifying "feature" functions determining the interactions between response variables and covariates. We develop a class of tractable penalized M-estimators to learn these CRF distributions from data, as well as a unified sparsistency analysis for this general class of CRFs showing exact structure recovery can be achieved with high probability.
A Review of Multivariate Distributions for Count Data Derived from the Poisson Distribution
Inouye, David I., Yang, Eunho, Allen, Genevera I., Ravikumar, Pradeep
The Poisson distribution has been widely studied and used for modeling univariate count-valued data. Multivariate generalizations of the Poisson distribution that permit dependencies, however, have been far less popular. Yet, real-world high-dimensional count-valued data found in word counts, genomics, and crime statistics, for example, exhibit rich dependencies, and motivate the need for multivariate distributions that can appropriately model this data. We review multivariate distributions derived from the univariate Poisson, categorizing these models into three main classes: 1) where the marginal distributions are Poisson, 2) where the joint distribution is a mixture of independent multivariate Poisson distributions, and 3) where the node-conditional distributions are derived from the Poisson. We discuss the development of multiple instances of these classes and compare the models in terms of interpretability and theory. Then, we empirically compare multiple models from each class on three real-world datasets that have varying data characteristics from different domains, namely traffic accident data, biological next generation sequencing data, and text data. These empirical experiments develop intuition about the comparative advantages and disadvantages of each class of multivariate distribution that was derived from the Poisson. Finally, we suggest new research directions as explored in the subsequent discussion section.
On Graphical Models via Univariate Exponential Family Distributions
Yang, Eunho, Ravikumar, Pradeep, Allen, Genevera I., Liu, Zhandong
Undirected graphical models, or Markov networks, are a popular class of statistical models, used in a wide variety of applications. Popular instances of this class include Gaussian graphical models and Ising models. In many settings, however, it might not be clear which subclass of graphical models to use, particularly for non-Gaussian and non-categorical data. In this paper, we consider a general sub-class of graphical models where the node-wise conditional distributions arise from exponential families. This allows us to derive multivariate graphical model distributions from univariate exponential family distributions, such as the Poisson, negative binomial, and exponential distributions. Our key contributions include a class of M-estimators to fit these graphical model distributions; and rigorous statistical analysis showing that these M-estimators recover the true graphical model structure exactly, with high probability. We provide examples of genomic and proteomic networks learned via instances of our class of graphical models derived from Poisson and exponential distributions.
Vector-Space Markov Random Fields via Exponential Families
Tansey, Wesley, Padilla, Oscar Hernan Madrid, Suggala, Arun Sai, Ravikumar, Pradeep
We present Vector-Space Markov Random Fields (VS-MRFs), a novel class of undirected graphical models where each variable can belong to an arbitrary vector space. VS-MRFs generalize a recent line of work on scalar-valued, uni-parameter exponential family and mixed graphical models, thereby greatly broadening the class of exponential families available (e.g., allowing multinomial and Dirichlet distributions). Specifically, VS-MRFs are the joint graphical model distributions where the node-conditional distributions belong to generic exponential families with general vector space domains. We also present a sparsistent $M$-estimator for learning our class of MRFs that recovers the correct set of edges with high probability. We validate our approach via a set of synthetic data experiments as well as a real-world case study of over four million foods from the popular diet tracking app MyFitnessPal. Our results demonstrate that our algorithm performs well empirically and that VS-MRFs are capable of capturing and highlighting interesting structure in complex, real-world data. All code for our algorithm is open source and publicly available.
On Poisson Graphical Models
Yang, Eunho, Ravikumar, Pradeep K., Allen, Genevera I., Liu, Zhandong
Undirected graphical models, such as Gaussian graphical models, Ising, and multinomial/categorical graphical models, are widely used in a variety of applications for modeling distributions over a large number of variables. These standard instances, however, are ill-suited to modeling count data, which are increasingly ubiquitous in big-data settings such as genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing classes of Poisson graphical models, which arise as the joint distributions that correspond to Poisson distributed node-conditional distributions, have a major drawback: they can only model negative conditional dependencies for reasons of normalizability given its infinite domain. In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables. We begin by discussing two strategies for truncating the Poisson distribution and show that only one of these leads to a valid joint distribution; even this model, however, has limitations on the types of variables and dependencies that may be modeled. To address this, we propose two novel variants of the Poisson distribution and their corresponding joint graphical model distributions. These models provide a class of Poisson graphical models that can capture both positive and negative conditional dependencies between count-valued variables. One can learn the graph structure of our model via penalized neighborhood selection, and we demonstrate the performance of our methods by learning simulated networks as well as a network from microRNA-Sequencing data.
Graphical Models via Generalized Linear Models
Yang, Eunho, Allen, Genevera, Liu, Zhandong, Ravikumar, Pradeep K.
Undirected graphical models, or Markov networks, such as Gaussian graphical models and Ising models enjoy popularity in a variety of applications. In many settings, however, data may not follow a Gaussian or binomial distribution assumed by these models. We introduce a new class of graphical models based on generalized linear models (GLM) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate networks for a wide class of exponential distributions, such as the Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We provide examples of high-throughput genomic networks learned via our GLM graphical models for multinomial and Poisson distributed data.